Talk by Steven Brumby from Decartes Labs. Given to the Redwood Center for Theoretical Neuroscience at UC Berkeley.
Abstract The proliferation of transistors has increased the performance of computing systems by over a factor of a million in the past 30 years, and is also dramatically increasing the amount of data in existence, driving improvements in sensor, communication and storage technology. Multi-decadal Earth and planetary remote sensing global datasets at the petabyte scale (8×10^15 bits) are now available in commercial clouds, and new satellite constellations are planning to generate petabytes of images per year, providing daily global coverage at a few meters per pixel. Cloud storage with adjacent high-bandwidth compute, combined with recent advances in neuroscience-inspired machine learning for computer vision, is enabling understanding of the world at a scale and at a level of granularity never before feasible.
We report here on a computation processing over a petabyte of compressed raw data from 2.8 quadrillion pixels (2.8 petapixels) acquired by the US Landsat and MODIS programs over the past 40 years. Using commodity cloud computing resources, we convert the imagery to a calibrated, georeferenced, multiresolution tiled format suited for machine-learning analysis. We believe ours is the first application to process, in less than a day, on generally available resources, over a petabyte of scientific image data. We report on work using this reprocessed dataset for experiments demonstrating country-scale food production monitoring, an indicator for famine early warning.